SSR molecular marker developments and genetic diversity analysis of Zanthoxylum nitidum (Roxb.) DC

Zanthoxylum nitidum (Roxb.) DC. is a commonly used traditional Chinese medicine. However, the collection and protection of wild germplasm resources of Z. nitidum are still insufficient, and there is limited research on its genetic diversity and fingerprinting. In the present study, 15 simple sequence repeat (SSR) markers were developed by genotyping based on multiplexed shotgun sequencing. The genetic diversity of 51 populations (142 individuals) of Z. nitidum was evaluated using these 15 SSRs. A total of 245 alleles (Na) were detected, with an average of 16.333 per locus, and the average polymorphism information content was 0.756. The genetic distance among the 51 populations ranged from 0.164 to 1.000, with an average of 0.659. Analysis of molecular variance attributed 40% of the genetic variation to differences between populations and 60% to differences between individuals. The genetic differentiation coefficient (Fst) of the population was 0.338, indicating that 66.2% of the genetic variation occurred within populations, and the gene flow (Nm) was 0.636, demonstrating that gene exchange between populations was low. At a genetic similarity coefficient of 0.30, cluster analysis divided the 51 populations into 4 groups of 2, 17, 3, and 29 populations. There was no specific relationship between geographical location and genetic distance. The genetic diversity of Z. nitidum is relatively high, and our results provide a theoretical basis for the rapid identification of Z. nitidum germplasm resources and for variety selection.

Zanthoxylum nitidum (Roxb.) DC., a notable medicinal plant, belongs to the genus Zanthoxylum in the Rutaceae family; its root, used as a traditional medicine, is named Liang-Mian-Zhen 1. Z. nitidum was first recorded in "Shennong Materia Medica Classic" during the Qin and Han dynasties under the name "Manjiao" 2. It mainly grows in Guangxi, Guangdong, and other locations in China 3. The roots mainly contain alkaloids 4,5, sesquiterpenoids 6, coumarins 7, lignans 8, and other components that have various pharmacological activities, such as anti-inflammatory 9,10, antibacterial 11,12, anticancer 13 and analgesic 14 activities. Z. nitidum is not only the main raw material of more than 60 famous traditional Chinese patent medicines and simple preparations, such as Sanjiu Weitai Granules, LiangMianZhen Analgesic Tablets, and Dieda Wanhua Oil, but its extracts are also widely used in toothpaste, soap, shampoo and other daily personal products 15,16.
Zanthoxylum nitidum can be subdivided into Z. nitidum var. nitidum and Z. nitidum var. tomentosum according to whether each part of the plant has short, rough hairs, particularly on both sides of the leaves. Furthermore, Z. nitidum var. nitidum is separated into three types based on the number of thorns on branches and leaf axes, as well as the size and thickness of leaflets 3,17. Due to the high market demand for Z. nitidum, wild resources have long been harvested recklessly and have become increasingly scarce 17,18. The fundamental measure to solve the resource crisis is to shift the source from wild collection to artificial cultivation. However, in the process of introducing and domesticating wild Z. nitidum germplasm, it is difficult to accurately distinguish resources with such similar traits using traditional morphological identification methods. Hence, there is a serious shortage of excellent basic germplasm materials for artificial cultivation, and the phenotypic characteristics and genetic background of cultivated germplasm are unclear, which seriously limits the development of industries related to Z. nitidum. The advantages of simple sequence repeat (SSR) molecular markers include rich polymorphism, wide distribution, easy operation, and high sensitivity 19. SSR molecular markers have been widely used in genetic diversity

Plant materials and DNA extraction
From September to October 2022, a total of 142 Z. nitidum var. nitidum and Z. nitidum var. tomentosum individuals from 51 populations in Guangdong and Guangxi were collected, including 16 germplasms from 7 artificially cultivated populations and 126 germplasms from 44 wild populations. The germplasm numbers and source information are shown in Table S1. Fresh young leaves were collected and stored at −80 °C for DNA extraction. Total genomic DNA was extracted from the frozen leaves by the magnetic bead method using a Plant Genomic Extraction Kit (NanoMagBio, Wuhan, China), following the kit instructions. The concentration and purity of the DNA were determined using an ultramicro spectrophotometer (NanoDrop ONE, Thermo Fisher, USA).

MSG library construction and sequencing
Three germplasms with significant phenotypic differences (GX0826-1, bl0915-1, 3-1) were selected for constructing the library. We prepared the sequencing library using the MSG method proposed by Andolfatto et al. 23. We purified the libraries and selected DNA fragments in the 400 bp size range using AMPure XP beads (Beckman Coulter, Inc., USA), and then amplified them using PCR for 14 cycles. The Illumina NovaSeq platform (Illumina, Inc., San Diego, USA) was used to obtain raw sequence data. The raw data were filtered using the sliding window method of fastp (v0.20.0) 24, and the read pairs were merged using FLASH (v1.2.11) 25 to obtain high-quality data. The sequencing data have been deposited in China National GeneBank (CNGB) with project accession number CNP0004226 (https://db.cngb.org/search/project/CNP0004226/).

SSR primer pair design
The Microsatellite Identification Tool (MISA) (http://pgrc.ipk-gatersleben.de/misa/) was used to search for SSR loci in all high-quality sequences. CD-HIT software 26 was used to cluster the sequences. A Perl program was used to analyze the clustering results and evaluate polymorphisms 27. Finally, PCR primer pairs flanking the SSR repeats were designed using Primer3 (v2.3.6) 28.
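To illustrate what an SSR search does, the following is a minimal sketch of perfect-repeat detection with regular expressions. It is not MISA, and the minimum repeat counts in `MIN_REPEATS`, the function name `find_ssrs`, and the example sequence are all illustrative assumptions rather than the settings or data used in this study.

```python
import re

# Minimum number of repeats required per motif length (bp).
# These thresholds are illustrative assumptions, not the study's settings.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for motif_len, min_count in sorted(min_repeats.items()):
        # ([ACGT]{k}) captures one motif; \1{n,} demands n further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_count - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT"),
            # because they belong to the shorter motif class.
            if any(motif == motif[:k] * (motif_len // k)
                   for k in range(1, motif_len) if motif_len % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


# Hypothetical sequence: (AG)8 at position 3 and (CTT)6 at position 23.
example = "GGC" + "AG" * 8 + "TTCA" + "CTT" * 6 + "GA"
for start, motif, count in find_ssrs(example):
    print(start, motif, count)
```

A dedicated tool additionally handles compound and interrupted repeats, which this sketch deliberately ignores.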

Fluorescence capillary electrophoresis detection
Fluorescent primers were obtained from Wuhan Tianyi Huiyuan Biotechnology Co., Ltd. (Wuhan, China), and the fluorescent dyes used were FAM, HEX, and TAMRA. One microliter of fluorescent PCR product, 0.5 μL of GeneScan™ 500 LIZ size standard, and 8.5 μL of Hi-Di™ formamide were loaded onto the plate, centrifuged, denatured (95 °C for 5 min), and cooled. Finally, the samples were analyzed using an ABI3731XL sequence analyzer (Applied Biosystems, USA).

Genetic diversity analysis
GenAlEx (v6.501) software was used to calculate genetic diversity indicators 29, including the number of observed alleles (Na), effective alleles (Ne), observed heterozygosity (Ho), expected heterozygosity (He), fixation index (F), Shannon's information index (I), genetic differentiation coefficient (Fst), and gene flow (Nm), and to perform the analysis of molecular variance (AMOVA). Linkage disequilibrium between the SSR loci was analyzed using SHEsisPlus 30. The genetic distance among populations was calculated using PowerMarker (v3.25) 31. The genetic structure of the germplasm was analyzed using STRUCTURE software (v2.3.4) 32, and cluster analysis between populations was based on the unweighted pair group method with arithmetic mean (UPGMA).
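For readers unfamiliar with these indices, the sketch below computes them for a single locus from the standard textbook definitions. It is not the GenAlEx or PowerMarker implementation: the function name `locus_diversity` and the genotypes are hypothetical, He is the uncorrected form 1 − Σp², and the natural logarithm is assumed for Shannon's index.

```python
import math
from collections import Counter


def locus_diversity(genotypes):
    """Per-locus diversity indices from diploid genotypes.

    genotypes: list of (allele_a, allele_b) tuples, one per individual.
    """
    alleles = [allele for pair in genotypes for allele in pair]
    total = len(alleles)
    freqs = [count / total for count in Counter(alleles).values()]
    sum_p2 = sum(p * p for p in freqs)

    na = len(freqs)                                   # observed alleles
    ne = 1.0 / sum_p2                                 # effective alleles
    ho = sum(a != b for a, b in genotypes) / len(genotypes)
    he = 1.0 - sum_p2                                 # uncorrected expected heterozygosity
    shannon = -sum(p * math.log(p) for p in freqs)    # natural log assumed
    fixation = 1.0 - ho / he if he > 0 else float("nan")
    # PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    cross = sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                for i in range(na) for j in range(i + 1, na))
    pic = 1.0 - sum_p2 - cross
    return {"Na": na, "Ne": ne, "Ho": ho, "He": he,
            "I": shannon, "F": fixation, "PIC": pic}


# Hypothetical fragment sizes (bp) for four individuals at one locus.
demo = [(150, 150), (150, 154), (154, 158), (150, 158)]
result = locus_diversity(demo)
print({name: round(value, 3) for name, value in result.items()})
```

A negative F, as in this toy example, corresponds to an excess of heterozygotes relative to the expected value.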

Plant collecting permit declaration
The plant materials used in this article did not involve disputes. We hereby declare that all of the plant materials (Z. nitidum var. nitidum and Z. nitidum var. tomentosum) were collected in compliance with institutional,

Result analysis

SSR quantity analysis
The MSG data of the three samples (GX0826-1, bl0915-1, and 3-1) were 8.63 Gb, 6.66 Gb, and 9.35 Gb, respectively. After filtering, the high-quality data were 1.30 Gb, 0.93 Gb, and 1.20 Gb, respectively. Furthermore, 2,903,693 read pairs were merged from a total of 12,014,757 read pairs. Using MISA software, 261,267 SSR loci were found, distributed among 227,023 unigenes. The frequency of SSR occurrence (the proportion of sequences containing SSR loci to all sequences) was 7.82%. The average distance between SSRs (total sequence length divided by the total number of SSRs) was 2.73 kb. Among them, 30,512 unigenes contained more than one SSR locus, and 29,314 SSRs were present in compound form (Table 1).
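The reported SSR frequency can be reproduced with a short calculation. The sketch below assumes that "all sequences" means the 2,903,693 merged read pairs; the text does not state this explicitly, but that denominator recovers the reported 7.82%. The total sequence length is not given, so the last line only back-calculates the length implied by the 2.73 kb spacing.

```python
merged_sequences = 2_903_693   # merged read pairs (assumed denominator)
ssr_sequences = 227_023        # sequences containing at least one SSR
total_ssrs = 261_267           # SSR loci reported by MISA
mean_spacing_kb = 2.73         # reported average distance between SSRs

# Proportion of sequences that carry at least one SSR locus.
frequency = ssr_sequences / merged_sequences
print(f"SSR frequency: {frequency:.2%}")

# Total sequence length implied by the reported spacing (back-calculation).
implied_total_kb = mean_spacing_kb * total_ssrs
print(f"Implied total length: {implied_total_kb:,.0f} kb")
```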

SSR marker development
On the basis of the identified SSR loci, 768 primer pairs were designed (Table S2). Of these, 192 randomly selected SSR markers (Table S3) were screened using 10 selected populations. In this preliminary screen, 39 SSR markers were polymorphic, giving a polymorphism rate of 20.31%. Then, 15 randomly selected SSR markers were used for genetic diversity analysis of the 51 Z. nitidum populations (Table 3).

SSR marker detection
A total of 245 alleles (Na) were amplified by the 15 developed SSR markers from the 51 populations (142 individuals). The total number of effective alleles (Ne) was 85.828, with an average of 5.722, and the proportion of effective alleles was 35.03%. Shannon's information index (I) ranged from 1.313 to 2.994, with an average value of 1.984. The polymorphism information content (PIC) ranged from 0.531 to 0.922, with an average of 0.756. None of the loci conformed to Hardy-Weinberg equilibrium (Table 4). The linkage disequilibrium distribution pattern based on R² values between SSR loci is shown in Fig. 1. R² ranged between 0.01 and 0.39, indicating a relatively low degree of linkage. These results show that the 15 chosen loci have a high degree of polymorphism and can be utilized for further research on the genetic diversity of Z. nitidum.
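The summary values in this paragraph follow directly from the totals, as the short check below shows. It uses only the figures reported in the text; the variable names are illustrative.

```python
n_loci = 15
total_na = 245       # observed alleles summed over all loci
total_ne = 85.828    # effective alleles summed over all loci

mean_na = total_na / n_loci            # average observed alleles per locus
mean_ne = total_ne / n_loci            # average effective alleles per locus
effective_share = total_ne / total_na  # proportion of effective alleles

print(f"mean Na = {mean_na:.3f}")
print(f"mean Ne = {mean_ne:.3f}")
print(f"effective share = {effective_share:.2%}")
```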

Population genetic diversity
Statistical analysis showed that the maximum genetic distance among the 51 populations was 1.000, the minimum was 0.164, and the average was 0.659 (Table S4). The population-level indices of genetic diversity for each population are shown in Table 5. The average values of Na, Ne, I, Ho, He and F at the population level were 2.220, 1.964, 0.619, 0.478, 0.380, and −0.268, respectively. Taken together, these results indicate that POP15 and POP1 exhibit the highest and lowest levels of genetic diversity, respectively.

Genetic differentiation
According to AMOVA, 40% of the total genetic variation originated from variability among the populations, whereas 60% was attributed to within-individual differences (Table 6). F-statistics analysis revealed that the genetic differentiation coefficient (Fst) of the population was 0.338, and the gene flow (Nm) was 0.636 (Table S5). www.nature.com/scientificreports/
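Gene flow is commonly derived from Fst with the island-model approximation Nm ≈ (1/Fst − 1)/4. Whether this exact estimator was used here is an assumption. Applied to the overall Fst of 0.338 it gives about 0.49 rather than the reported 0.636. One possible explanation, also an assumption, is that Nm was averaged over per-locus values: the relation is nonlinear, so the mean of per-locus Nm values differs from Nm at the mean Fst. The per-locus Fst values below are hypothetical and serve only to show that effect.

```python
def nm_from_fst(fst):
    """Island-model approximation of gene flow: Nm = (1/Fst - 1) / 4."""
    return (1.0 / fst - 1.0) / 4.0


# Overall Fst reported in the text.
print(round(nm_from_fst(0.338), 3))

# Hypothetical per-locus Fst values, for illustration only.
per_locus_fst = [0.20, 0.30, 0.50]
mean_fst = sum(per_locus_fst) / len(per_locus_fst)

nm_of_mean = nm_from_fst(mean_fst)
mean_of_nm = sum(nm_from_fst(f) for f in per_locus_fst) / len(per_locus_fst)

# Averaging per-locus Nm gives a larger value than Nm at the mean Fst.
print(round(nm_of_mean, 3), round(mean_of_nm, 3))
```

Under either calculation Nm stays below 1, so the conclusion of limited gene exchange between populations is unaffected.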

Phylogeny tree
According to Nei's genetic similarity coefficient, the UPGMA method was used to construct a phylogenetic tree. The genetic similarity coefficient among the 51 populations ranged from 0 to 0.44, and at a similarity coefficient of 0.30 the populations could be grouped into 4 clusters (Fig. 2). Cluster I contained 2 populations, both of which have been identified as Z. nitidum var. tomentosum. Cluster II contained 17 populations, all of which have been identified as type 1 of Z. nitidum var. nitidum. Cluster III included 3 populations from Guangdong, all of which have been identified as type 3 of Z. nitidum var. nitidum. Cluster IV contained 29 populations, including all 7 cultivated populations, all of which have been identified as type 2 of Z. nitidum var. nitidum.
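The clustering step can be illustrated with a compact UPGMA sketch in plain Python. It works on a distance matrix (for example, one minus similarity) instead of similarities, and the four population labels and distances are hypothetical, not values from this study. The helper names `upgma`, `leaves` and `cut` are likewise illustrative.

```python
def upgma(labels, dist):
    """Build a UPGMA tree as nested (left, right, merge_distance) tuples."""
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}   # id -> (tree, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Merge the closest pair of active clusters.
        (a, b), height = min(d.items(), key=lambda item: item[1])
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        # New distances are the size-weighted mean of the old ones.
        for c in clusters:
            d_ac = d.pop((min(a, c), max(a, c)))
            d_bc = d.pop((min(b, c), max(b, c)))
            d[(c, next_id)] = (n_a * d_ac + n_b * d_bc) / (n_a + n_b)
        del d[(a, b)]
        clusters[next_id] = ((tree_a, tree_b, height), n_a + n_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


def leaves(tree):
    """List the leaf labels under a tree node."""
    if isinstance(tree, str):
        return [tree]
    left, right, _ = tree
    return leaves(left) + leaves(right)


def cut(tree, threshold):
    """Split the tree into groups whose members join at or below threshold."""
    if isinstance(tree, str) or tree[2] <= threshold:
        return [leaves(tree)]
    return cut(tree[0], threshold) + cut(tree[1], threshold)


# Hypothetical distances among four populations.
names = ["P1", "P2", "P3", "P4"]
matrix = [[0.0, 0.2, 0.6, 0.8],
          [0.2, 0.0, 0.6, 0.8],
          [0.6, 0.6, 0.0, 0.4],
          [0.8, 0.8, 0.4, 0.0]]
tree = upgma(names, matrix)
print(cut(tree, 0.5))
```

Cutting the tree at a chosen threshold is what turns the dendrogram into discrete groups, analogous to reading off clusters at a fixed similarity coefficient.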

Discussion
SSRs, as important molecular markers, have been widely used in genetic diversity evaluation, genetic map construction, and fingerprinting in medicinal plants such as Aristolochia delavayi 33, Andrographis paniculata 34, and Paris polyphylla 35. However, genetic research on Z. nitidum has been hampered by limited genetic information. MSG has been used for high-throughput discovery of SSRs 23. MSG is similar in essence to restriction site-associated DNA (RAD) sequencing 36 and whole-genome resequencing (WGD) 37, and it is more effective and flexible for detecting SSRs. In this study, 261,267 SSR loci were discovered by MSG.
The Na, Ne, Ho, He, F, I, and PIC values of primers are important indicators of the polymorphism of SSR markers. A PIC > 0.5 indicates that the primer has a high degree of polymorphism 38. In this study, we designed 768 SSRs based on MSG data and screened out specific markers of Z. nitidum from them. To evaluate the effectiveness of the designed SSR primers, we randomly selected 192 SSRs and conducted two rounds of screening based on 10 germplasms with significant phenotypic differences. Twenty-six SSRs were found to be effective. Fifteen of these 26 SSRs were selected for genetic diversity analysis of 142 individuals. The PIC of these 15 SSRs ranged from 0.531 to 0.922, with an average of 0.756. These 15 SSRs are highly polymorphic and can effectively reveal the genetic diversity of Z. nitidum. This is the first time SSR markers have been used for genetic diversity analysis of Z. nitidum populations; at the same time, our results confirm that SSR markers have the advantages of a large number of loci and high identification efficiency.
Among the indicators of population genetic diversity, expected heterozygosity (He) 39 and Shannon's information index (I) 40 are important.The larger their values are, the richer the genetic diversity of the population is.This study found that the He of these 51 populations ranged from 0.133 to 0.600, with an average of 0.380; the I ranged from 0.185 to 1.043, with an average of 0.619.These data indicate that the 51 populations have rich genetic diversity.In general, Fst > 0.25 indicates a high level of genetic differentiation among populations 41 ; Nm < 1 indicates that the genetic differentiation among populations is caused by migration or genetic drift 42 .In this study, the Fst of 51 populations was 0.338, and the Nm was 0.636, which indicated that the genetic differentiation among populations of Z. nitidum is at a high level.Additionally, the genetic differentiation between populations was caused by migration or genetic drift, which is consistent with the conclusions reached by previous researchers using ISSR molecular markers 43 , and similar to the results of genetic differentiation studies of other Zanthoxylum plants 44,45 .
Conducting variety identification by combining molecular marker technology with phenotypic analysis is reliable and convenient 46,47. In our study, we constructed a UPGMA clustering tree using the 15 SSRs based on Nei's genetic similarity coefficient. The 51 populations could be clearly grouped into 4 clusters. The first cluster was identified as Z. nitidum var. tomentosum, indicating a distant genetic relationship between Z. nitidum var. tomentosum and the rest of the germplasm, with relatively clear separation. The second cluster was identified as Z. nitidum var. nitidum (type 1), indicating a distant genetic relationship between type 1 and the other two types. The third cluster was identified as Z. nitidum var. nitidum (type 3), all of which originated from Guangdong, showing obvious regional characteristics. The fourth cluster was identified as Z. nitidum var. nitidum (type 2), with the closest genetic relationship to type 3, and included the 7 artificially cultivated populations, indicating a decrease in genetic diversity after the transition from wild to domesticated. The genetic relationships among the four types of Z. nitidum inferred from cluster analysis are consistent with traditional morphological identification and demonstrate the effectiveness of the SSR molecular markers selected in this study for identifying Z. nitidum germplasm.

Conclusion
In this study, MSG data were used to develop SSRs, and 261,267 SSRs were detected. Fifteen of these SSRs proved to be powerful molecular markers for genetic diversity analysis and variety identification in Z. nitidum. Our data will be useful for germplasm identification, genetic improvement, and variety selection.

Figure 1. Linkage disequilibrium distribution pattern based on R² values between SSR loci. The color of the box ranges from white to red, representing the level of linkage disequilibrium from low to high.

Table 1. SSR search results based on MSG data.

Table 3. List of 15 SSR markers used in the present study.

Table 4. Genetic diversity parameters of 15 SSR markers. The P-value is the result of the Hardy-Weinberg equilibrium test; the smaller the P-value, the stronger the evidence that the locus deviates from Hardy-Weinberg equilibrium.

Table 6. Analysis of molecular variance (AMOVA) of the 142 individuals. df, degrees of freedom; SS, sum of squares; MS, mean square; Est. Var., estimated variance.